BigQueryで「テーブル」及び「ビュー」の一覧を取得する
Google BigQueryにおいて、作成した要素を一覧情報として可視化したい、管理したいというケースがあるかと思います。(ちょうど今私がそんな思いに駆られておりました)
なので、その辺りの情報を、公式ドキュメントを中心にした形でまとめてみたいと思います。
目次
データセットの一覧情報
まずはおさらい的に「データセットの一覧」を取得する術について、こちらは下記のエントリにその内容をまとめています。
テーブルの一覧情報
「テーブル」に関する情報は主に以下のドキュメントにて「取得方法」を学ぶ事が出来ます。
テーブル一覧(基本情報)
データセット名.INFORMATION_SCHEMA.TABLESから得られる情報は以下。
- テーブルタイプ(table_type)
- テーブルがDML INSERTに対応しているか否か(is_insertable_info)
- 常にNoの値を取る項目:意味合いは不明(is_typed)
- テーブル作成時間(creation_time)
テーブルタイプについては、Google Cloud Storage経由で作成したテーブルもある(テーブル名にgcsと含まれているものが該当)のですが、EXTERNALとならずにBASE TABLEと判断されてしまっていますね...何でだろ?
$ bq query --nouse_legacy_sql \ 'SELECT * FROM cmbqdataset.INFORMATION_SCHEMA.TABLES ORDER BY table_type, table_name' Waiting on bqjob_rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_1 ... (0s) Current status: DONE +------------------+--------------+-------------------------------------+------------+--------------------+----------+---------------------+ | table_catalog | table_schema | table_name | table_type | is_insertable_into | is_typed | creation_time | +------------------+--------------+-------------------------------------+------------+--------------------+----------+---------------------+ | cm-xxxxxxxxxxxxx | cmbqdataset | bqt | BASE TABLE | YES | NO | 2020-03-16 23:21:12 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | BASE TABLE | YES | NO | 2020-03-16 10:56:04 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_console | BASE TABLE | YES | NO | 2020-03-16 05:37:46 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs | BASE TABLE | YES | NO | 2020-03-16 16:15:52 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs_autodetect | BASE TABLE | YES | NO | 2020-03-16 17:02:08 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs_python | BASE TABLE | YES | NO | 2020-03-16 17:51:08 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_autodetect | BASE TABLE | YES | NO | 2020-03-16 14:31:02 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_def_schema | BASE TABLE | YES | NO | 2020-03-16 15:03:24 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_via_python | BASE TABLE | YES | NO | 2020-03-16 15:34:50 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters | BASE TABLE | YES | NO | 2020-03-13 09:21:08 | | cm-xxxxxxxxxxxxx | cmbqdataset | maxstrings | BASE TABLE | YES | NO | 2020-03-15 11:04:51 | | cm-xxxxxxxxxxxxx | cmbqdataset | ml_movies | BASE TABLE | YES | NO | 2020-03-16 12:06:40 | | cm-xxxxxxxxxxxxx | cmbqdataset | sample_table_1 | BASE TABLE | YES | NO | 2020-03-11 18:55:35 | | cm-xxxxxxxxxxxxx | cmbqdataset | shortlifetable | BASE TABLE | YES | NO | 2020-03-16 23:35:53 | | cm-xxxxxxxxxxxxx | cmbqdataset | t_sales | BASE TABLE | YES | NO | 2020-03-11 18:56:48 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_appearances_top30 | VIEW | NO | NO | 2020-03-17 02:03:45 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | VIEW | NO | NO | 2020-03-17 04:09:44 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_python | VIEW | NO | NO | 2020-03-17 05:31:07 | | cm-xxxxxxxxxxxxx | cmbqdataset | my_shared_view | VIEW | NO | NO | 2020-03-17 04:23:31 | +------------------+--------------+-------------------------------------+------------+--------------------+----------+---------------------+
テーブル一覧(指定のテーブルにおけるオプション情報)
データセット名.INFORMATION_SCHEMA.TABLE_OPTIONSから得られる情報は以下。
- パーティション分割テーブルのすべてのパーティションのデフォルトの存続期間:日数(partition_expiration_days)
- テーブルのデフォルトの存続期間:日数(expiration_timestamp)
- テーブルの暗号化に使用される Cloud KMS 鍵の名前(kms_key_name)
- テーブルのわかりやすい名前(friendly_name)
- テーブルの説明(description)
- テーブルのラベルを表す STRUCT の配列(labels)
- テーブルに対するクエリでパーティション フィルタが必要かどうか(require_partition_filter)
$ bq query --nouse_legacy_sql \ 'SELECT * FROM cmbqdataset.INFORMATION_SCHEMA.TABLE_OPTIONS ORDER BY table_name, option_name' Waiting on bqjob_rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_1 ... (0s) Current status: DONE +------------------+--------------+-----------------------------------+---------------------------+-------------------------------+-----------------------------------------------------------------------+ | table_catalog | table_schema | table_name | option_name | option_type | option_value | +------------------+--------------+-----------------------------------+---------------------------+-------------------------------+-----------------------------------------------------------------------+ | cm-xxxxxxxxxxxxx | cmbqdataset | bqt | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-03-30T23:21:12.834Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | description | STRING | "a table that expires at 2021, with each partition living for 7 days" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-12-31T23:59:59.000Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | labels | ARRAY<STRUCT<STRING, STRING>> | [STRUCT("production_mode", "false")] | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | partition_expiration_days | FLOAT64 | 7.0 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | require_partition_filter | BOOL | false | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_console | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T05:37:46.605Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_console | require_partition_filter | BOOL | true | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T16:15:52.633Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs_autodetect | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T17:02:08.301Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_autodetect | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T14:31:02.420Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_def_schema | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T15:03:24.734Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_via_python | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T15:34:50.953Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-09T09:21:08.025Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | description | STRING | "マーベルの登場回数 TOP30" | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-03-31T04:09:44.746Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | labels | ARRAY<STRUCT<STRING, STRING>> | [STRUCT("production", "false")] | | cm-xxxxxxxxxxxxx | cmbqdataset | maxstrings | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-11T11:04:51.341Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | ml_movies | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-09-12T12:06:40.406Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | my_shared_view | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-03-31T04:23:31.288Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | shortlifetable | expiration_timestamp | TIMESTAMP | TIMESTAMP "2020-03-30T23:35:53.487Z" | | cm-xxxxxxxxxxxxx | cmbqdataset | shortlifetable | require_partition_filter | BOOL | false | +------------------+--------------+-----------------------------------+---------------------------+-------------------------------+-----------------------------------------------------------------------+
対象テーブルのカラムスキーマ情報
データセット名.INFORMATION_SCHEMA.COLUMNSからは、対象テーブルの列における定義情報を得る事が出来ます。テーブル詳細情報を表示する時に使えるものですね。
$ bq query --nouse_legacy_sql \ 'SELECT * FROM cmbqdataset.INFORMATION_SCHEMA.COLUMNS WHERE table_schema = "cmbqdataset" AND table_name = "bqtable_from_cli" ORDER BY ordinal_position' Waiting on bqjob_rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_1 ... (0s) Current status: DONE +------------------+--------------+------------------+------------------+------------------+-------------+--------------------+--------------+-----------------------+-----------+-----------+--------------+-------------------+------------------------+-----------------------------+ | table_catalog | table_schema | table_name | column_name | ordinal_position | is_nullable | data_type | is_generated | generation_expression | is_stored | is_hidden | is_updatable | is_system_defined | is_partitioning_column | clustering_ordinal_position | +------------------+--------------+------------------+------------------+------------------+-------------+--------------------+--------------+-----------------------+-----------+-----------+--------------+-------------------+------------------------+-----------------------------+ | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | user_id | 1 | NO | INT64 | NEVER | NULL | NULL | NO | NULL | NO | NO | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | user_name | 2 | NO | STRING | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | item_bytes | 3 | YES | BYTES | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | value_float | 4 | YES | FLOAT64 | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | value_numeric | 5 | YES | NUMERIC | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | value_boolean | 6 | YES | BOOL | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | birthday | 7 | NO | DATE | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | create_timestamp | 8 | YES | TIMESTAMP | NEVER | NULL | NULL | NO | NULL | NO | YES | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | create_datetime | 9 | YES | DATETIME | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | create_time | 10 | YES | TIME | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | item_geography | 11 | YES | GEOGRAPHY | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | item_record | 12 | YES | STRUCT<key STRING> | NEVER | NULL | NULL | NO | NULL | NO | NO | NULL | +------------------+--------------+------------------+------------------+------------------+-------------+--------------------+--------------+-----------------------+-----------+-----------+--------------+-------------------+------------------------+-----------------------------+
テーブル毎の件数・サイズ情報
テーブルのメタ情報を管理しているデータセット名.__TABLES__をクエリすることで、テーブル毎の件数やサイズにおける情報を得る事が出来ます。
$ bq query --nouse_legacy_sql \ 'SELECT * FROM cmbqdataset.__TABLES__ ORDER BY table_id' Waiting on bqjob_rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_1 ... (0s) Current status: DONE +------------------+-------------+-------------------------------------+---------------+--------------------+-----------+------------+------+ | project_id | dataset_id | table_id | creation_time | last_modified_time | row_count | size_bytes | type | +------------------+-------------+-------------------------------------+---------------+--------------------+-----------+------------+------+ | cm-xxxxxxxxxxxxx | cmbqdataset | bqt | 1584400872834 | 1584400872848 | 0 | 0 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_cli | 1584356164444 | 1584356164444 | 0 | 0 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_console | 1584337066605 | 1584337103950 | 0 | 0 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs | 1584375352633 | 1584375352633 | 72165 | 3701768 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs_autodetect | 1584378128301 | 1584378128301 | 72165 | 3701768 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_gcs_python | 1584381068625 | 1584381358018 | 72165 | 3701768 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_autodetect | 1584369062420 | 1584369062420 | 62423 | 3177691 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_from_localfile_def_schema | 1584371004734 | 1584378261078 | 134588 | 6879459 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | bqtable_via_python | 1584372890953 | 1584372890953 | 62423 | 3177691 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters | 1584091268025 | 1584091268025 | 173 | 27762 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_appearances_top30 | 1584410625082 | 1584411473343 | 0 | 0 | 2 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | 1584418184746 | 1584418184837 | 0 | 0 | 2 | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_python | 1584423067216 | 1584423067377 | 0 | 0 | 2 | | cm-xxxxxxxxxxxxx | cmbqdataset | maxstrings | 1584270291341 | 1584270572916 | 4 | 425043 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | ml_movies | 1584360400406 | 1584360400406 | 62423 | 3177691 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | my_shared_view | 1584419011288 | 1584419011418 | 0 | 0 | 2 | | cm-xxxxxxxxxxxxx | cmbqdataset | sample_table_1 | 1583952935240 | 1583952935411 | 0 | 0 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | shortlifetable | 1584401753487 | 1584401753585 | 0 | 0 | 1 | | cm-xxxxxxxxxxxxx | cmbqdataset | t_sales | 1583953008700 | 1583966890527 | 15 | 435 | 1 | +------------------+-------------+-------------------------------------+---------------+--------------------+-----------+------------+------+
ビューの一覧情報
データセット名.INFORMATION_SCHEMA.VIEWSでは、対象データセット内に存在するビューの定義情報等を得る事が出来ます。(同じSQLを様々な方式で試したので、結果同じビュー定義が並ぶ形になってしまったのはご愛嬌ということで...)
$ bq query --nouse_legacy_sql 'SELECT * FROM cmbqdataset.INFORMATION_SCHEMA.VIEWS' Waiting on bqjob_rxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_1 ... (0s) Current status: DONE +------------------+--------------+-------------------------------------+--------------------------------------------------+--------------+------------------+ | table_catalog | table_schema | table_name | view_definition | check_option | use_standard_sql | +------------------+--------------+-------------------------------------+--------------------------------------------------+--------------+------------------+ | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_appearances_top30 | SELECT | NULL | YES | | | | | name_alias, | | | | | | | appearances, | | | | | | | gender, | | | | | | | full_reserve_avengers_intro, | | | | | | | year, | | | | | | | years_since_joining, | | | | | | | honorary, | | | | | | | notes | | | | | | | FROM | | | | | | | cm-xxxxxxxxxxxxx.cmbqdataset.marvel_characters | | | | | | | ORDER BY | | | | | | | appearances DESC | | | | | | | LIMIT 30 | | | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_python | | NULL | YES | | | | | SELECT | | | | | | | name_alias, | | | | | | | appearances, | | | | | | | gender, | | | | | | | full_reserve_avengers_intro, | | | | | | | year, | | | | | | | years_since_joining, | | | | | | | honorary, | | | | | | | notes | | | | | | | FROM | | | | | | | cm-xxxxxxxxxxxxx.cmbqdataset.marvel_characters | | | | | | | ORDER BY | | | | | | | appearances DESC | | | | | | | LIMIT 30 | | | | | | | | | | | cm-xxxxxxxxxxxxx | cmbqdataset | my_shared_view | | NULL | YES | | | | | SELECT | | | | | | | name_alias, | | | | | | | appearances, | | | | | | | gender, | | | | | | | full_reserve_avengers_intro, | | | | | | | year, | | | | | | | years_since_joining, | | | | | | | honorary, | | | | | | | notes | | | | | | | FROM | | | | | | | cm-xxxxxxxxxxxxx.cmbqdataset.marvel_characters | | | | | | | ORDER BY | | | | | | | appearances DESC | | | | | | | LIMIT 30 | | | | | | | | | | | cm-xxxxxxxxxxxxx | cmbqdataset | marvel_characters_by_cli | | NULL | YES | | | | | SELECT | | | | | | | name_alias, | | | | | | | appearances, | | | | | | | gender, | | | | | | | full_reserve_avengers_intro, | | | | | | | year, | | | | | | | years_since_joining, | | | | | | | honorary, | | | | | | | notes | | | | | | | FROM | | | | | | | cm-xxxxxxxxxxxxx.cmbqdataset.marvel_characters | | | | | | | ORDER BY | | | | | | | appearances DESC | | | | | | | LIMIT 30 | | | | | | | | | | +------------------+--------------+-------------------------------------+--------------------------------------------------+--------------+------------------+
まとめ
という訳で、Google BigQueryにおける各種リソースの一覧情報取得に関するまとめでした。
BigQueryを管理運営していく上で、こういったメタデータ情報をまとめて確認出来るのはとても大事です。適切な情報を適切な形で見られるようにしておくことで、効率良いデータ分析を進めていきたいものですね。